Can Common Crawl reliably track persistent identifier (PID) use over time?

Authors

  • Henry S. Thompson
  • Jian Tong
Abstract

We report here on the results of two studies using two and four monthly web crawls respectively from the Common Crawl (CC) initiative between 2014 and 2017, whose initial goal was to provide empirical evidence for the changing patterns of use of so-called persistent identifiers. This paper focusses on the tooling needed for dealing with CC data, and the problems we found with it. The first study is based on over 10^12 URIs from over 5x10^9 pages crawled in April 2014 and April 2017, the second study adds a further 3x10^9 pages from the April 2015 and April 2016 crawls. We conclude with suggestions on specific actions needed to enable studies based on CC to give reliable longitudinal information.

Journal:
  • CoRR

Volume abs/1802.01424

Publication date: 2018